LOG LINEAR MODEL APPLICATIONS


Herbert  Solomon 
Stanford  University 


Methods for multivariate data analysis have been developed and
received wide usage in the past 30 years. The advent of the computer
in the early 1950's made possible an acceleration in the number of
multivariate studies. The first to profit was factor analysis which became a
commonplace in exploratory studies. Subsequently, linear discriminant
analysis, classification methods and then cluster analyses became part
of the emerging excitement in attempting and doing studies either not
considered possible before or not thought about at all before because
of the lack of computer resources. Another multivariate method has also
profited from the computer era. It is multidimensional contingency table
analysis through use of the log-linear model. Very recently a two part
survey and discussion of the literature of log-linear and logistic
categorical data modeling appeared in two issues of the International
Statistical Review authored by Imrey, Koch, and Stokes [5] and [6].

In this exposition we will look at some applications of the log-linear
model, or as it is sometimes called, logistic response analysis.
It may be instructive to view one situation explored about 25 years ago
through classification techniques, before the computer capability and
software to handle the log-linear model were available, and then analyze the
same data base by contingency table analysis. After this comparison, we
will discuss other and more recent direct applications of the log-linear
model.

This report is based on an invited talk delivered at the First Pacific Area
Statistical Conference, "Recent Developments in Statistical Theory and Data
Analysis" held in Tokyo, Japan in December 1982. The Conference was sponsored
by the Pacific Statistical Institute and endorsed by the Institute of
Statistical Mathematics (Tokyo), the American Statistical Association, and the
Statistical Society of Australia.


Classification and Log-Linear Models


In an attempt to examine classification procedures for a set of
dichotomous variables for which interactions of higher order were not
assumed to be zero, that is, conditions that could violate the usual
normality assumptions in discriminant analysis and classification techniques
available in the 1950's, Solomon [8] reports on some conclusions for a data
base on attitudes toward science elicited from 2982 high school seniors
in New Jersey. This data base is then re-examined by Gokhale and Kullback
[3] employing a log-linear model. Six items on a 1957 questionnaire
"Attitudes Toward Science and Scientific Careers" were chosen on the basis of
good discrimination between high and low IQ as measured on a brief IQ
vocabulary test. This led to two groups each containing 1491 students.

The scoring was such that a student either 'agreed', recorded as 1, or
'disagreed', recorded as 0, with the items listed below:

$x_1$  The development of new ideas is the scientist's greatest source
of satisfaction.

$x_2$  Scientists and engineers should be eliminated from the military
draft.

$x_3$  The scientist will make his maximum contribution to society when
he has freedom to work on problems which interest him.

$x_4$  The monetary compensation of a Nobel Prize winner in physics should
be at least equal to that given popular entertainers.

$x_5$  The free flow of scientific information among scientists is
essential to scientific progress.

$x_6$  The neglect of basic scientific research would be the equivalent
of "killing the goose that laid the golden eggs."


For some of the proposed analyses, even the use of six items proved to be
cumbersome for the computing resources available at Columbia University
about 1958 so that only the first four of the six items above were used in
several studies. The frequency distributions for the four items are
given in Table 1.

One of the purposes of the study was to contrast the effectiveness
of the sum of item responses with the use of the total response vector,
that is, to allow, for example, for the added information in employing the
vector (1101) in a classification procedure rather than just the score
value equal to three. For classification procedures, other aggregates
of information to represent attitudes may be developed that fall between
the total response vector and the sum, giving less information than
the former and more than the latter. One way to quantify this is to exploit
a representation of the joint distribution of responses for n dichotomous
items developed by Bahadur [1].


Let X denote the set of all points $x = (x_1, x_2, \ldots, x_n)$, each
$x_i = 0$ or 1, and let p(x) be a given probability distribution on X.
For each $i = 1, 2, \ldots, n$, let $\alpha_i = \Pr(x_i = 1) = E(x_i)$, that is, the
$\alpha_i$ represent the marginal frequencies of the $x_i$'s. Label $p_{[1]}(x_1, x_2, \ldots, x_n)$
the joint probability distribution of the $x_i$ when the $x_i$'s are
independently distributed and they have the same marginal distributions as under
the given $p(x_1, x_2, \ldots, x_n)$. Then we may write

\[ p_{[1]}(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} \alpha_i^{x_i} (1 - \alpha_i)^{1 - x_i} \]

\[ p(x) = p_{[1]}(x) \cdot f(x) \]

TABLE 1

JOINT DISTRIBUTION OF RESPONSES ON ATTITUDE TO SCIENCE

                      Low IQ                      High IQ
x1x2x3x4    Frequency   Relative frequency   Frequency   Relative frequency

1111            62            .042              122            .082
1110            70            .047               68            .046
1101            31            .021               33            .022
1100            41            .027               25            .017
1011           283            .190              329            .221
1010           253            .170              247            .166
1001           200            .134              172            .115
1000           305            .205              217            .146
0111            14            .009               20            .013
0110            11            .007               10            .007
0101            11            .007               11            .007
0100            14            .009                9            .006
0011            31            .021               56            .038
0010            46            .031               55            .037
0001            37            .025               64            .043
0000            82            .055               53            .036

Totals        1491           1.000             1491           1.002

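where

\[ f(x) = 1 + \sum_{i<j} r_{ij} z_i z_j + \sum_{i<j<k} r_{ijk} z_i z_j z_k + \cdots + r_{12 \cdots n}\, z_1 z_2 \cdots z_n \]

\[ z_i = \frac{x_i - \alpha_i}{\sqrt{\alpha_i (1 - \alpha_i)}} \]

\[ r_{ij} = E(z_i z_j) \]

\[ r_{ijk} = E(z_i z_j z_k) \]

\[ r_{12 \cdots n} = E(z_1 z_2 \cdots z_n) . \]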

By dropping out terms, we get approximations to the actual frequencies
p(x) and thus arrive at quantitative ways to register the amount of
information less than the total response vector or more than the sum.
For example, if joint normal distributions prevail, then all correlations
higher than second order, $r_{ijk}$, etc., are zero; or we could in our
four item case assume that some $r_{ijk}$ and $r_{ijkl}$ are not zero, an event
also tolerated by a log-linear model.
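The representation and its truncations can be checked numerically. The following is a minimal sketch, assuming a small hypothetical joint distribution on three dichotomous items (the weights are illustrative and are not data from the study); the helper names `z`, `p_indep`, `r`, and `f` are introduced here only for the example. It computes the marginals, the standardized responses, and the correlation parameters, then compares the full expansion and the second-order truncation with p(x).

```python
from itertools import combinations, product
from math import prod, sqrt

# Hypothetical joint distribution p(x) on three dichotomous items
# (illustrative weights only; not data from the paper).
n = 3
points = list(product((0, 1), repeat=n))
weights = [8, 5, 3, 6, 4, 7, 2, 9]
total = sum(weights)
p = {x: w / total for x, w in zip(points, weights)}

# Marginals alpha_i = Pr(x_i = 1).
alpha = [sum(p[x] for x in points if x[i] == 1) for i in range(n)]


def z(x, i):
    """Standardized response z_i = (x_i - alpha_i) / sqrt(alpha_i (1 - alpha_i))."""
    return (x[i] - alpha[i]) / sqrt(alpha[i] * (1 - alpha[i]))


def p_indep(x):
    """p_[1](x): independent items with the same marginals as p."""
    return prod(alpha[i] if x[i] == 1 else 1 - alpha[i] for i in range(n))


def r(subset):
    """Correlation parameter r_S = E_p(product of z_i over i in S)."""
    return sum(p[x] * prod(z(x, i) for i in subset) for x in points)


def f(x, max_order=n):
    """Correction factor f(x), keeping interaction terms up to max_order."""
    value = 1.0
    for order in range(2, max_order + 1):
        for subset in combinations(range(n), order):
            value += r(subset) * prod(z(x, i) for i in subset)
    return value


# Full expansion versus the approximation that drops the third-order term.
full = {x: p_indep(x) * f(x) for x in points}
second_order = {x: p_indep(x) * f(x, max_order=2) for x in points}
max_error_full = max(abs(full[x] - p[x]) for x in points)
max_error_second = max(abs(second_order[x] - p[x]) for x in points)
print(max_error_full, max_error_second)
```

With all terms retained the product reproduces p(x) up to rounding error, while the truncated version still sums to one but generally differs cell by cell; that difference is one way to see how much information the dropped terms carry.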

For  our  four  item  situation,  the  following  values  were  obtained 


r12 

r13  r14 

r23 

003 

.143  .111 

.180 

.144  .043 

.155 

.Ttj 


             $\alpha_1$     $\alpha_2$     $\alpha_3$     $\alpha_4$

High IQ      .821     .159     .505     .436
Low IQ       .801     .189     .599     .530

The main thrust of this study is to provide a procedure for
classifying a student into the high IQ or low IQ categories based on a
representation of the joint distribution (actual or approximated) of his or her
four or six responses. Later, in log-linear model language, we can
discuss the odds ratios for these events or equivalently the probability
of being placed in one category or the other given a response vector.

We now use a likelihood ratio procedure for classifying an individual
in High IQ or Low IQ, i.e., compute

\[ L(x) = \frac{h(x)}{l(x)} \]

and classify a student with response vector x as a member of the high IQ
group if and only if $L(x) \ge c$, otherwise classify as low IQ, where h(x)
is the probability distribution of x in the high IQ group, l(x) is the
probability distribution of x in the low IQ group, and c is a fixed constant
equal to some value of L(x). Now let $\alpha_c$ denote the probability of
misclassifying a student from the high IQ group into the low IQ group
when c is the cut-off point, that is

\[ \alpha_c = \sum_{x:\, L(x) < c} h(x) \]

and likewise

\[ \beta_c = \sum_{x:\, L(x) \ge c} l(x) \]

is the probability of misclassifying an individual from the low IQ group into
the high IQ group. For our four item case, there will be 17 decision
rules, the ith rule given by

\[ \frac{h(x)}{l(x)} \ge c_i . \]

The $c_i$ are determined in the following way. Take h(x)/l(x) for all
16 response vectors, then realign them in descending order according to
the values of the likelihood ratios. These values provide the cut-off
points $c_i$ which lead to the 17 decision rules.
The first rule is classify as low-IQ, L, for every response vector; the
second rule is classify in L unless the newly selected first response vector
is observed; the third rule is classify in L unless one of the first two
newly selected response vectors is observed, etc.; the seventeenth rule is
classify as high-IQ, H, for every response vector. The risks in using each
of these 17 rules are now computed by summing over the appropriate values
of h(x) and l(x), which can be obtained from the new relisting. Obviously,
the curve is always bounded by the straight line in the $\alpha$, $\beta$ plane connecting
(1,0) and (0,1).
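The construction of the 17 rules can be sketched directly from the frequencies in Table 1. The code below is an illustrative sketch rather than the original computation: it uses the observed relative frequencies as h(x) and l(x), orders the 16 response vectors by likelihood ratio, and records the pair of risks for each rule. The variable names (`table1`, `rules`, `best_error`) are introduced only for this example.

```python
# Frequencies from Table 1, keyed by response vector x1x2x3x4: (low IQ, high IQ).
table1 = {
    "1111": (62, 122), "1110": (70, 68), "1101": (31, 33), "1100": (41, 25),
    "1011": (283, 329), "1010": (253, 247), "1001": (200, 172), "1000": (305, 217),
    "0111": (14, 20), "0110": (11, 10), "0101": (11, 11), "0100": (14, 9),
    "0011": (31, 56), "0010": (46, 55), "0001": (37, 64), "0000": (82, 53),
}
n_low = sum(low for low, _ in table1.values())
n_high = sum(high for _, high in table1.values())

# Observed relative frequencies l(x) and h(x).
low_dist = {x: low / n_low for x, (low, _) in table1.items()}
high_dist = {x: high / n_high for x, (_, high) in table1.items()}

# Response vectors in descending order of the likelihood ratio h(x) / l(x).
ordered = sorted(table1, key=lambda x: high_dist[x] / low_dist[x], reverse=True)

# Rule k classifies the k vectors with the largest ratios as high IQ (k = 0..16).
rules = []
for k in range(len(ordered) + 1):
    to_high = ordered[:k]
    alpha = 1.0 - sum(high_dist[x] for x in to_high)  # high IQ student sent to low
    beta = sum(low_dist[x] for x in to_high)          # low IQ student sent to high
    rules.append((k, alpha, beta))

# Equal weight on both kinds of error: minimize (alpha + beta) / 2 over the rules.
best_k, best_alpha, best_beta = min(rules, key=lambda rule: 0.5 * (rule[1] + rule[2]))
best_error = 0.5 * (best_alpha + best_beta)
print(len(rules), round(best_error, 3))
```

Plotting the (alpha, beta) pairs in `rules` traces the curve between (1,0) and (0,1); the smallest equally weighted error from the observed frequencies comes out near .44, in line with the figure quoted for the four item total response vector.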

In Figure 1, we see the $\alpha$, $\beta$ curves for four and six items. If
we give equal weight to $\alpha$ and $\beta$ (the sample size for both IQ groups is
1491) the overall probability of misclassification ($\frac{1}{2}[\alpha + \beta]$) is
approximately .44 when using the total response vector of four items,
approximately .48 for the sum; approximately .39 when using the total
response vector of six items, approximately .42 for the sum. One could
say that administering six items and employing the sum is almost equivalent
to administering four items and employing the total response vector.


The data base given in Table 1 provided an anchor for a number of
investigators interested in the analysis of multivariate dichotomous data,
especially from a multi-dimensional contingency table or log-linear model
viewpoint. Cox [2] gives a review of methods and models for analyzing
multivariate binary data and lists the data in Table 1 as an example. Also,
Martin and Bradley [7] applied a model based on a set of orthogonal
polynomials to the data in Table 1 and Goodman [4] discusses the data in
Table 1 as a base to examine methods for selecting models for contingency
tables. We now present a procedure by Gokhale and Kullback [3] that is
based on minimum discrimination information (MDI) estimation which they
apply to the data in Table 1.

It  may  be  instructive  to  look  at  this  technique  for  the  2x2 
contingency  table. 

Observed Values x(ij)

             j=1       j=2

i=1         x(11)     x(12)     x(1.)
i=2         x(21)     x(22)     x(2.)

            x(.1)     x(.2)     x(..)=n

The dot is the label indicating summation over the index it replaces,
that is x(1.) and x(2.) are the row marginals and x(.1) and x(.2) are
the column marginals. Under the hypothesis of independence, the cell
entries are estimated as a product of marginals, that is
x*(ij) = x(i.)x(.j)/n. Typically we then use chi-square with one
degree of freedom to measure the divergence between x(ij) and x*(ij).
Note the table of estimated values x*(ij) has the same marginals as
the observed table x(ij). Now a common statistical measure of the
association or interaction between the variables of a 2x2 contingency
table is the cross-product ratio or its logarithm, which varies from $-\infty$
to $+\infty$ and is zero when there is no association.
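As a small illustration of these quantities, the sketch below uses a hypothetical 2x2 table of counts (not data from the paper) to compute the independence estimates x*(ij), the chi-square measure of divergence, and the log cross-product ratio.

```python
from math import log

# Hypothetical 2x2 table of observed counts x(ij); not data from the paper.
x = [[30, 20],
     [10, 40]]

n = sum(sum(row) for row in x)                    # x(..) = n
row_tot = [sum(row) for row in x]                 # x(i.)
col_tot = [x[0][j] + x[1][j] for j in range(2)]   # x(.j)

# Estimates under independence: x*(ij) = x(i.) x(.j) / n.
x_star = [[row_tot[i] * col_tot[j] / n for j in range(2)] for i in range(2)]

# Chi-square divergence between x(ij) and x*(ij), one degree of freedom.
chi_square = sum((x[i][j] - x_star[i][j]) ** 2 / x_star[i][j]
                 for i in range(2) for j in range(2))

# Log cross-product ratio of the observed table and of the fitted table.
log_cpr_observed = log(x[0][0] * x[1][1] / (x[0][1] * x[1][0]))
log_cpr_fitted = log(x_star[0][0] * x_star[1][1] / (x_star[0][1] * x_star[1][0]))
print(x_star, chi_square, log_cpr_observed, log_cpr_fitted)
```

The fitted table has the same marginals as the observed one, and its log cross-product ratio is zero, which is the sense in which the independence estimate carries no association.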

For a 2x2x2 contingency table we have x(ijk) and an estimate
under mutual independence is

\[ x^*(ijk) = \frac{x(i..)\, x(.j.)\, x(..k)}{n^2} . \]

One then asks whether these estimates or estimates under other hypotheses
or models, e.g. first order interactions are not zero and the second order
interaction is zero, provide good fits. An index to test goodness of fit
is required and the $\chi^2$ test statistic with an appropriate number of
degrees of freedom emerges.

From  information  theory  concepts,  Gokhale  and  Kullback  show  that 
a  log  linear  model  results  in  which  the  log  of  the  ratio  of  observed 
to  fitted  cells  or  fitted  cells  for  two  populations  (as  in  our  case) 
is  a  linear  function  of  contingency  cell  parameters  and  that  the 
measure of goodness of fit is chi-square. The cell estimates obtained
in  this  fashion  are  called  minimum  discrimination  information  (MDI) 
estimates  because  the  information  index  employed  measures  divergence 
between  two  distributions  and  this  is  to  be  minimized. 


Gokhale  and  Kullback  treat  the  data  given  in  Table  1  as  a  five 
way  2x2x2x2x2  contingency  table  and  wish  to  apply  their  technique  to 
classification  into  one  of  the  two  multivariate  dichotomous  populations, 
that  is,  the  same  focus  in  the  preceding  discussion  on  classification. 
They  denote  the  original  observations  in  Table  1  by  X(hijkl),  where 

Variable     Index     1            2

IQ           h         low IQ       high IQ
Item 1       i         disagree     agree
Item 2       j         disagree     agree
Item 3       k         disagree     agree
Item 4       l         disagree     agree

As is typical, a study of main effects and interaction effects that
impact on low IQ and high IQ discrimination is begun by examining the
hypothesis that the IQ groupings are homogeneous over the 16 response
vectors. The MDI estimate is of the form $x^*_a(hijkl) = x(h....)\,x(.ijkl)/n$, where
the subscript a refers to this null hypothesis and the dot refers to
summation over the index it replaces, and the test statistic

\[ 2I(x : x^*_a) = 2 \sum_h \sum_i \sum_j \sum_k \sum_l x(hijkl) \log \frac{n\, x(hijkl)}{x(h....)\, x(.ijkl)} \]

has a $\chi^2$ distribution with 15 degrees of freedom. This is equivalent
to the test for homogeneity in a 2x16 table. For the data in Table 1,
$2I(x : x^*_a) = 68.369$ and so the hypothesis of homogeneity is rejected.
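The homogeneity statistic can be recomputed from the frequencies in Table 1. The sketch below treats the data as a 2x16 table, with the 16 response vectors listed in the row order of Table 1; it is an illustration of the formula above, not the authors' program, and the variable names are introduced only for this example.

```python
from math import log

# Table 1 frequencies in row order 1111, 1110, ..., 0000.
low = [62, 70, 31, 41, 283, 253, 200, 305, 14, 11, 11, 14, 31, 46, 37, 82]
high = [122, 68, 33, 25, 329, 247, 172, 217, 20, 10, 11, 9, 56, 55, 64, 53]
x = [low, high]

n = sum(low) + sum(high)
group_tot = [sum(low), sum(high)]                    # x(h....)
vector_tot = [a + b for a, b in zip(low, high)]      # x(.ijkl)

# MDI estimate under homogeneity: x*_a(hijkl) = x(h....) x(.ijkl) / n.
x_star = [[group_tot[g] * vector_tot[v] / n for v in range(16)] for g in range(2)]

# 2I(x : x*_a) = 2 * sum of x * log(x / x*), all cells being positive here.
statistic = 2 * sum(x[g][v] * log(x[g][v] / x_star[g][v])
                    for g in range(2) for v in range(16))
degrees_of_freedom = (2 - 1) * (16 - 1)
print(round(statistic, 3), degrees_of_freedom)
```

The value obtained this way agrees with the reported 68.369 to within rounding, on 15 degrees of freedom.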

The next estimate $x^*_b$ includes the marginal x(hi...) and
corresponds to the model that IQ is homogeneous over the response to the last
three items, given the response to the first item, that is

\[ x^*_b(hijkl) = \frac{x(hi...)\, x(.ijkl)}{x(.i...)} \]

TABLE 2

OBSERVED AND ESTIMATED SCIENCE ATTITUDE DATA

            Observed                      Observed
            Low IQ       Estimated        High IQ      Estimated
ijkl        x(1ijkl)     x*(1ijkl)        x(2ijkl)     x*(2ijkl)

2222           62          74.589           122         109.414
2221           70          67.296            68          70.703
2212           31          31.329            33          32.671
2211           41          37.780            25          28.219
2122          283         266.570           329         345.429
2121          253         259.322           247         240.679
2112          200         193.625           172         178.376
2111          305         314.491           217         207.508


with

\[ 2I(x : x^*_b) = 2 \sum_h \sum_i \sum_j \sum_k \sum_l x(hijkl) \log \frac{x(hijkl)\, x(.i...)}{x(hi...)\, x(.ijkl)} \]

and from the data we get $2I(x : x^*_b) = 65.993$ with 14 degrees of freedom
and therefore significance. Obviously more structure is required to
get a good fit (that is, $2I(x : x^*)$ is not a significant $\chi^2$ value).
We thus continue in a sequential manner by examining additional hypotheses
of main effects and interactions. The MDI estimate x*(hijkl) whose values
are in Table 2 was selected because $2I(x : x^*) = 16.307$ with 11 degrees
of freedom. This estimate x* is symmetric with respect to the four
items, and leads to the following parametric representation for the
log-odds (low IQ/high IQ) over 16 response vectors. For example


\[ \log \frac{x^*(11111)}{x^*(21111)} = \tau^{h}_{1} + \tau^{hi}_{11} + \tau^{hj}_{11} + \tau^{hk}_{11} + \tau^{hl}_{11} \]


that is, a linear regression for the log-odds in terms of an overall
average $\tau^{h}_{1}$ and the main effects of each component of the response
vector, namely $\tau^{hi}_{11}$, $\tau^{hj}_{11}$, $\tau^{hk}_{11}$, $\tau^{hl}_{11}$. It is not surprising that the main
effects alone lead to a good fit. In our previous discussion on using
the total response vector, we computed all order interaction terms
and their magnitudes were not very large, suggesting they need not be
included to obtain a good fit as measured by the $\chi^2$ statistic.

From the data base in Table 1, the following values are obtained:

\[ \tau^{h}_{1} = -0.3831, \quad \tau^{hi}_{11} = -0.2030, \quad \tau^{hj}_{11} = 0.1240, \quad \tau^{hk}_{11} = 0.3411, \quad \tau^{hl}_{11} = 0.3338 \]


From the log-odds representation above we get

\[ \frac{x^*(11111)}{x^*(21111)} = \exp(\tau^{h}_{1}) \exp(\tau^{hi}_{11}) \exp(\tau^{hj}_{11}) \exp(\tau^{hk}_{11}) \exp(\tau^{hl}_{11}) \]

or the odds ratio

\[ \frac{x^*(11111)}{x^*(21111)} = (.682)(.816)(1.132)(1.406)(1.396) = 1.237 \]

so that for a student who disagrees with all four items, there is a
probability of 1.237/2.237 = .55 that he or she belongs
to the low IQ category. From the actual observations, we get a
probability of 82/135 = .61.
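The arithmetic from parameter estimates to odds and then to a probability can be sketched as follows. The dictionary keys are labels chosen here for the five reported estimates; the observed comparison uses the 0000 row of Table 1 (82 low IQ, 53 high IQ).

```python
from math import exp

# Reported estimates for the response vector that disagrees with all four items.
tau = {
    "h": -0.3831,    # overall average
    "hi": -0.2030,   # item 1 main effect
    "hj": 0.1240,    # item 2 main effect
    "hk": 0.3411,    # item 3 main effect
    "hl": 0.3338,    # item 4 main effect
}

log_odds = sum(tau.values())            # log of x*(11111) / x*(21111)
odds = exp(log_odds)                    # low IQ to high IQ odds
prob_low = odds / (1 + odds)            # probability of the low IQ category

# Same probability from the observed frequencies in Table 1 (row 0000).
observed_prob_low = 82 / (82 + 53)
print(round(odds, 3), round(prob_low, 3), round(observed_prob_low, 3))
```

The conversion p = odds / (1 + odds) is the only step needed to go from any fitted odds in the log-linear representation to the probability of a category given a response vector.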

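Since x(1....) = x*(1....) = 1491 and x(2....) = x*(2....) = 1491,
we can assign a response vector (ijkl) to population h = 1 (low IQ) or
to population h = 2 (high IQ) depending on whether

\[ \log \frac{x^*(1ijkl)}{x^*(2ijkl)} > 0 \]

or

\[ \log \frac{x^*(1ijkl)}{x^*(2ijkl)} < 0 . \]

The probability of error of misclassification from this assignment
procedure is

\[ \frac{1}{2} \left[ \sum_{(ijkl) \in E_2} \frac{x^*(1ijkl)}{1491} + \sum_{(ijkl) \in E_1} \frac{x^*(2ijkl)}{1491} \right] \]

where $E_h$ denotes the set of response vectors assigned to population h.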
This probability of misclassification error is 0.444. If we employ x
instead of x*, the probability is 0.441. Note that in our prior
discussion of classification employing the total response vector and
the likelihood ratio criterion, the overall probability of
misclassification error was approximately 0.44, essentially the same, as it should
be if the log-linear model employed is a good fit.

One important result of the use of the log-linear model to analyze
attitudes toward science and IQ in high school seniors is the fact that
odds ratios and consequently probabilities of events can be computed
directly. For the classification problem where an assignment procedure
is developed, the probabilities of misclassification can be computed but
these are not as satisfying as, say, the probability that a particular
student is in the high IQ or low IQ category given a specific four item
response on the questionnaire. It was this unconditional probability
of an event that was desired in the 1950's but one settled for
conditional probabilities within the classification framework and even there
computer characteristics provided limitations.

Parole  Outcome. 

Let us now continue the use of the log-linear model on another data
base where more than two category levels are permitted for some variables.
In the early 1970's the US Parole Board was concerned with parole decision
making. It would like parole to be granted to a prisoner who would not
violate parole conditions or be sentenced again for at least a two
year period after release from prison. Since about thirty percent of
those placed on parole are recidivists (return within two years), the
Board was determined to learn what variables, if any, had an impact on
recidivism and also the magnitude of the impact.
A study to accomplish this was initiated based on the records of
approximately 2500 parolees (actually 2497) from federal prisons. For
this group, 754 or about 30% were recidivists. A large number of variables
were examined in connection with their association with recidivism and from
these nine were selected, essentially by linear regression methods, to
produce a score (linear sum) that would provide information on future
failure or success in parole violation. Seven of these nine items and
their levels are:

VARIABLES AND CATEGORY LEVELS: 2497 PAROLEES

                              1                  2                3

Prior Convictions:            none               one or two       three or more
                              338                609              1550

Prior Incarcerations:         none               one or two       three or more
                              779                726              992

Age at First Commitment:      18 or over         under 18
                              1503               994

Commitment Offense:           auto theft         otherwise
                              796                1701

Prior Parole:                 no parole          otherwise
                              1752               745

Drug History:                 no hard drugs      otherwise
                              1987               510

Release Plan:                 with spouse        otherwise
                              491                2006

Parole Outcome:               success            failure
                              1742               754

Information  on  parole  outcome  is  listed  above  with  the  seven  items; 
the  two  items  not  appearing  above  are:  employment  record  and 
educational  level  of  prisoner. 


In a study of this data by Solomon [9], four five-way contingency
tables of variable factors believed to affect the outcome of parole were
investigated. Of the four tables studied, one led to an estimated table
relating the effects of four explanatory variables on parole outcome that
was analyzed in detail. This estimate is based on a simple additive
log-linear model, just as in the previous illustration on science attitude and
IQ, but it accounts for a very high percentage of the total variation in
parole outcome. The four explanatory variables in this case are: (i) number
of prior convictions; notation H and index h; (ii) prior parole; notation
J and index j; (iii) commitment offense; notation K and index k; (iv) release
plan; notation M and index m. The parole outcome variable has notation N
and index n.

The log odds representation that gives a good fit (93.2% explanation
of the total variation) is

\[ \log \frac{x^*(hjkm1)}{x^*(hjkm2)} = \tau^{N}_{1} + \tau^{HN}_{h1} + \tau^{JN}_{j1} + \tau^{KN}_{k1} + \tau^{MN}_{m1} \]

that is, only the main effects of the four variables are required plus
$\tau^{N}_{1}$ which is a general or unconditional measure of parole outcomes.

In Table 2 we see observed and estimated parole outcomes employing
the four way contingency table depicted in that listing. The estimated
outcomes for each of the 24 patterns are close to the observed outcomes.
This now permits the construction of Table 3 which shows for each
pattern of item responses, the odds of parole success.

The  probability  of  parole  success  for  each  pattern  can  then  be  easily 
obtained.  For  example,  a  prisoner  with  no  prior  convictions  (level  1)  and 


TABLE 3

Number of Prior    Prior      Commitment    Release
Convictions        Parole     Offense       Plan        Odds

      1              1            2            1       15.727
      1              1            1            1       10.108
      1              2            2            1        9.840
      2              1            2            1        7.425
      1              1            2            2        7.279
      1              2            1            1        6.324
      3              1            2            1        4.810
      2              1            1            1        4.772
      1              1            1            2        4.679
      2              2            2            1        4.645
      1              2            2            2        4.554
      2              1            2            2        3.437
      3              1            1            1        3.092
      3              2            2            1        3.010
      2              2            1            1        2.986
      1              2            1            2        2.927
      3              1            2            2        2.226
      2              1            1            2        2.209


no prior parole (level 1) who is not in prison for auto theft (level 2)
and has a release plan with spouse (level 1) has odds of parole success
equal to 15.73 and a probability of parole success equal to
15.73/16.73 = .94, whereas a prisoner with a pattern (3,2,1,2) has odds of
parole success equal to .90 and a probability of parole success equal to
.90/1.90 = .47. Recall that the observed odds of
success, not conditioning on any explanatory variable, is 2.31, leading
to a probability of success equal to 2.31/3.31 = .70. Note that a prisoner
with a (3,1,1,1) pattern, that is, one with a large number of prior
convictions but not too bad on the other variables, has an odds of success
equal to 3.092 and a probability of success equal to 3.092/4.092 = .76.
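These conversions can be sketched in a few lines. The odds below are copied from Table 3; the pattern (3,2,1,2) is not among the rows shown there, so the sketch derives it under the assumption that the main-effects-only model holds exactly, in which case changing the level of one variable multiplies the odds by a fixed factor. The function name `probability` and the factor variables are introduced only for this example.

```python
# Odds of parole success from Table 3, keyed by
# (prior convictions, prior parole, commitment offense, release plan).
odds = {
    (1, 1, 2, 1): 15.727,
    (1, 2, 2, 1): 9.840,
    (1, 1, 2, 2): 7.279,
    (3, 1, 1, 1): 3.092,
}


def probability(o):
    """Convert odds of success into a probability of success."""
    return o / (1 + o)


# Assumption: with main effects only, moving one variable from level 1 to
# level 2 multiplies the odds by the same factor for every pattern.
parole_factor = odds[(1, 2, 2, 1)] / odds[(1, 1, 2, 1)]   # prior parole 1 -> 2
plan_factor = odds[(1, 1, 2, 2)] / odds[(1, 1, 2, 1)]     # release plan 1 -> 2
odds_3212 = odds[(3, 1, 1, 1)] * parole_factor * plan_factor

# Unconditional odds from the observed outcomes: 1742 successes, 754 failures.
overall_odds = 1742 / 754

for label, o in [("(1,1,2,1)", odds[(1, 1, 2, 1)]), ("(3,1,1,1)", odds[(3, 1, 1, 1)]),
                 ("(3,2,1,2)", odds_3212), ("overall", overall_odds)]:
    print(label, round(o, 3), round(probability(o), 2))
```

Under that assumption the derived odds for (3,2,1,2) come out close to the .90 quoted in the text, which gives a quick consistency check on the tabled values.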

Thus  those  involved  in  parole  decision  making  can  get  a  quick  idea 
of  the  probability  of  parole  recidivism  by  responses  to  four  items. 

This  is  even  shorter  than  the  nine  item  questionnaire  obtained  from 
regression  analyses  that  leads  to  sums  ranging  between  zero  and  eleven 
and  moreover  directly  leads  to  a  probability  of  an  event  of  interest 
(parole  success) .  In  these  ways  the  log-linear  model  far  surpasses 
the  regression  or  discriminant  type  analyses  employed  previously. 

Towaway  Accidents  and  Injuries. 

Data were collected on approximately 3200 towaway accidents in the
early 1970's, that is, accidents in which autos required towing to
repair shops; see Solomon [10]. These accident data led to a
multidimensional contingency table that included the following
categorized variables.


(1)  Severity  of  injury;  categorized  as  either  minor  or  none,  or 
moderate  or  worse;  (2)  Auto  occupant  restraint  system  used;  none,  lap 


To determine the non-zero effects the following hypotheses were
tested:

$H_1$: model includes only main effects ($\lambda^{IR}$, $\lambda^{ID}$, $\lambda^{IX}$)

$H_2$: model includes main effects plus interaction between restraint
used and general area of damage ($\lambda^{IR}$, $\lambda^{ID}$, $\lambda^{IX}$, $\lambda^{IRD}$)

$H_3$: model includes main effects plus interaction between restraint
used and extent of first impact ($\lambda^{IR}$, $\lambda^{ID}$, $\lambda^{IX}$, $\lambda^{IRX}$)

$H_4$: model includes main effects plus interaction between general
area of damage and extent of first impact ($\lambda^{IR}$, $\lambda^{ID}$, $\lambda^{IX}$, $\lambda^{IDX}$)

$H_5$: model includes main effects plus two interaction terms, the
interaction between restraint used and extent of first impact
and the interaction between area damaged and extent of first
impact ($\lambda^{IR}$, $\lambda^{ID}$, $\lambda^{IX}$, $\lambda^{IRX}$, $\lambda^{IDX}$)

$H_6$: model includes two main effects and one interaction, the effect
due to general area of damage, the effect due to extent of first
impact, and the interaction between them ($\lambda^{ID}$, $\lambda^{IX}$, $\lambda^{IDX}$).


The  following  table  indicates  for  each  hypothesis  the  likelihood  ratio 
statistic,  the  degrees  of  freedom,  and  the  P-value  associated  with  the 
likelihood  ratio  statistic. 


D.F. 


P-value 


Under $H_1$ we get a significant likelihood ratio value so that we reject
$H_1$: model includes only main effects. Therefore we must add an
interaction term to the main effects. The three possible ways we could add
an interaction are given by $H_2$, $H_3$, and $H_4$. All three hypotheses are
quite an improvement over $H_1$ (as evidenced by the large decrease in the
likelihood ratio value) but the best one by far is $H_4$, the addition of
the interaction of general area of damage and extent of impact.

Since we get a likelihood ratio of 64 with a P-value of .2203, we
accept hypothesis $H_4$. This will be the model under which we estimate
the effects. It accounts for 97% of the total variation in the original
data.

Thus for the 3209 towaway accident cases in the study, the
predicted odds ratio for major to minor injury is


XX  + 


where  the  numerical  estimates  of  the  parameters  lead  to  odds  ratios  and 
probabilities.  The  subscript  values  for  X  relate  to  the  categories 
given  in  the  beginning  of  this  section  for  I,  R,  D,  X.  Note  that  for 
this  base,  an  interaction  term,  namely,  damage  and  impact  is  indicated 
along  with  main  effects  to  get  a  good  fit. 

Table 4 of odds ratios and probabilities for minor injuries and
worse injuries is now displayed. These are the operational results of
the contingency table analysis on the towaway accident data.


TABLE 4

ODDS OF WORSE INJURY TO MINOR INJURY GIVEN THE TYPE OF RESTRAINT
USED, FIRST GENERAL AREA OF DAMAGE, AND EXTENT OF IMPACT

FIRST GENERAL AREA    TYPE OF                    EXTENT OF IMPACT
OF DAMAGE             RESTRAINT            1             2           3 OR 4

UNCLASSIFIABLE        NO RESTRAINT     [illegible]     0.5874        1.2924
                      LAP ONLY         [illegible]     0.3775        0.8305
                      LAP AND TORSO    [illegible]     0.2565        0.5644

TOP OR BOTTOM         NO RESTRAINT       0.1958        0.7754        0.8285
                      LAP ONLY           0.1258        0.4983        0.5323
                      LAP AND TORSO      0.0855        0.3386        0.3618

SIDE                  NO RESTRAINT     [illegible]   [illegible]     0.3175
                      LAP ONLY         [illegible]   [illegible]     0.2040
                      LAP AND TORSO    [illegible]   [illegible]     0.1386

FRONT                 NO RESTRAINT     [illegible]     0.1952        0.5643
                      LAP ONLY         [illegible]     0.1254        0.3626
                      LAP AND TORSO    [illegible]     0.0852        0.2464

REAR                  NO RESTRAINT     [illegible]     0.1119        0.0723
                      LAP ONLY         [illegible]     0.0719        0.0465
                      LAP AND TORSO    [illegible]     0.0488        0.0316


PROBABILITY OF A WORSE INJURY, GIVEN THE TYPE OF RESTRAINT USED,
FIRST GENERAL AREA OF DAMAGE, AND EXTENT OF IMPACT

FIRST GENERAL AREA    TYPE OF                    EXTENT OF IMPACT
OF DAMAGE             RESTRAINT            1             2           3 OR 4

UNCLASSIFIABLE        NO RESTRAINT     [illegible]     0.3700        0.5637
                      LAP ONLY         [illegible]     0.2740        0.4537
                      LAP AND TORSO    [illegible]     0.2041        0.3607

TOP OR BOTTOM         NO RESTRAINT       0.1637        0.4367        0.4531
                      LAP ONLY           0.1117        0.3325        0.3474
                      LAP AND TORSO      0.0787        0.2529        0.2656

SIDE                  NO RESTRAINT       0.0928        0.0934        0.2410
                      LAP ONLY           0.0617        0.0621        0.1694
                      LAP AND TORSO      0.0427        0.0430        0.1217

FRONT                 NO RESTRAINT     [illegible]   [illegible]     0.3607
                      LAP ONLY         [illegible]   [illegible]     0.2661
                      LAP AND TORSO    [illegible]   [illegible]     0.1977

REAR                  NO RESTRAINT       0.0481      [illegible]     0.0674
                      LAP ONLY           0.0314      [illegible]     0.0444
                      LAP AND TORSO      0.0215      [illegible]     0.0306


Marine Reenlistment.


Another data base and the last to be discussed in this paper results
from a study of Marine Corps reenlistment. Longitudinal data existed for
about 10,000 Marines who enlisted in 1968. By 1972, information was then
available on those who reenlisted and those who did not after service
terms of 1, 2, 3, or 4 years. This presents an opportunity to examine
the variables that might be associated with reenlistment and the impact
of those variables on reenlistment. At first a large number of variables
can be introduced that could possibly impact on reenlistment decisions,
e.g. rank, military pay, number of dependents, length of enlistment,
educational level, IQ, age, race, and area of the United States from
which enlistment was accomplished. Some additional variables could be
time before reenlistment decision at which rank is achieved, service in
Vietnam or not, reenlistment bonus. Some of the more important variables
turned out to be rank, length of enlistment, number of dependents, current
primary job in the service, region of country from which the Marine came, and
IQ group. Some variables of a bit lesser importance turned out to be
educational level, Vietnam service or not, and time at which rank is
achieved. Such variables as race and age at enlistment turned out not
to be important, but of course these could be associated with some of the
important or somewhat important variables.

The overall reenlistment rate was .07. We would now like to examine
profiles of Marines that might lead to much larger probabilities of
reenlistment and, of course, those that would give an even smaller
probability of reenlistment. In order to do this we once again rely on
a log-linear model to obtain the log odds of reenlistment to
nonreenlistment. When this is accomplished employing some of the more
important variables we can offer some interesting results. For example,
we can write


$$\ln \frac{p_{1k}}{p_{2k}} = \beta^{T} + \beta^{TR}_{k}, \qquad k = 1 \text{ for high rank}, \quad k = 2 \text{ for low rank}.$$

In this simple statement the log odds of reenlistment to non-reenlistment
are seen to depend on $\beta^{T}$, the general mean for the log odds, and
$\beta^{TR}_{k}$, the association between rank and reenlistment decision.
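Because the two outcomes are exhaustive, so that $p_{1k} + p_{2k} = 1$ for each rank $k$, the log odds can be inverted to recover the probabilities themselves. A short worked step under that assumption, with the rank superscript written as $TR$:

$$p_{1k} = \frac{\exp\left(\beta^{T} + \beta^{TR}_{k}\right)}{1 + \exp\left(\beta^{T} + \beta^{TR}_{k}\right)}, \qquad p_{2k} = \frac{1}{1 + \exp\left(\beta^{T} + \beta^{TR}_{k}\right)}.$$

Equivalently, if the odds $p_{1k}/p_{2k}$ are denoted $\omega_{k}$ (a symbol introduced here only for illustration), then $p_{1k} = \omega_{k}/(1 + \omega_{k})$.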

To further illustrate what we now develop, consider another example.
Assume that reenlistment is a function of two variables: length of
enlistment, L, and the presence or absence of dependents, D. Then
$p_{tid}$ represents the probability that a specified reenlistment
decision $t$ occurs given an individual's length of enlistment $i$ and
dependency status $d$. As before, the logarithm of the odds of
reenlistment to not reenlisting can be written as

$$\ln \frac{p_{1id}}{p_{2id}} = \beta^{T} + \beta^{TL}_{i} + \beta^{TD}_{d} + \beta^{TLD}_{id}.$$

Note that here we are allowing for the interaction of length of
enlistment and dependency status.
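To make the roles of the main-effect and interaction terms concrete, the following is a minimal Python sketch that decomposes the log odds for the dependents rows of Table 6, Part A. It assumes sum-to-zero constraints and simple unweighted averages of the cell log odds; this is an illustrative decomposition, not the weighted fit used in the study, so its main-effects value need not agree with any published estimate. All variable and function names are hypothetical.

```python
import math

# Odds of reenlistment from Table 6, Part A (dependents rows).
# Keys: (length of enlistment in years, dependency status).
odds = {
    (2, "none"): 0.030, (3, "none"): 0.037, (4, "none"): 0.072,
    (2, "one_or_more"): 0.056, (3, "one_or_more"): 0.080,
    (4, "one_or_more"): 0.457,
}
lengths = [2, 3, 4]
deps = ["none", "one_or_more"]

log_odds = {cell: math.log(value) for cell, value in odds.items()}


def mean(values):
    """Unweighted arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# ASSUMPTION: sum-to-zero (effect) coding with unweighted cell means.
beta_T = mean(log_odds.values())                       # general mean
beta_TL = {i: mean(log_odds[(i, d)] for d in deps) - beta_T
           for i in lengths}                           # length effect
beta_TD = {d: mean(log_odds[(i, d)] for i in lengths) - beta_T
           for d in deps}                              # dependents effect
beta_TLD = {(i, d): log_odds[(i, d)] - beta_T - beta_TL[i] - beta_TD[d]
            for i in lengths for d in deps}            # interaction


def odds_full(i, d):
    """Odds from the model that includes the interaction term."""
    return math.exp(beta_T + beta_TL[i] + beta_TD[d] + beta_TLD[(i, d)])


def odds_main_effects(i, d):
    """Odds when the interaction term is dropped."""
    return math.exp(beta_T + beta_TL[i] + beta_TD[d])


print(f"with interaction:  {odds_full(4, 'one_or_more'):.3f}")
print(f"main effects only: {odds_main_effects(4, 'one_or_more'):.3f}")
```

With the interaction term included, the model reproduces each tabulated cell exactly. Dropping it pulls the four-year, one-or-more-dependents cell below its tabulated odds of .457, which is the kind of underestimate the interaction term is there to prevent.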

The overall reenlistment rate of .07 corresponds to unconditional odds
of .074 to one in favor of reenlistment. If we now condition on Marines
who enlist for four years, the odds of reenlistment increase from .074


TABLE 5

ODDS OF REENLISTMENT AND PROBABILITY OF REENLISTMENT

                               Odds of          Probability of
                               Reenlistment     Reenlistment

Length of Enlistment
  Two Years                      .041              .04
  Three Years                    .055              .05
  Four Years                     .182              .15

Race
  White                          .047              .04
  Non-white                      .117              .10

Military Occupation
  Ground combat                  .056              .05
  Clerical and related           .073              .07
  Other                          .084              .08
  General repair                 .087              .08

Region
  East                           .068              .06
  North                          .075              .07
  South                          .096              .09
  West                           .130              .12

Education
  High School or above           .061              .06
  Less than High School          .090              .08

Combat
  In combat (Vietnam)            .076              .07
  Not in combat                  .104              .09
TABLE 6

ODDS OF REENLISTMENT AND PROBABILITY OF REENLISTMENT

A. Odds of Reenlistment by Length of Enlistment

                                     Length of Enlistment (in years)
                                        2          3          4
Rank
  E1, E2                               .014       .018       .079
  E3                                   .019       .023       .095
  E4                                   .037       .042       .266
  E5 and above                         .274       .258       .514

Dependents
  None                                 .030       .037       .072
  One or more                          .056       .080       .457

Current-Primary Job
  Same Current-Primary Job             .028       .054       .181
  Different Current-Primary Job        .089       .087       .239

B. Probability of Reenlistment by Length of Enlistment

                                     Length of Enlistment (in years)
                                        2          3          4
Rank
  E1, E2                               .01        .02        .07
  E3                                   .02        .02        .09
  E4                                   .04        .04        .21
  E5 and above                         .22        .21        .34

Dependents
  None                                 .03        .04        .07
  One or more                          .05        .07        .32

Current-Primary Job
  Same Current-Primary Job             .03        .05        .15
  Different Current-Primary Job        .08        .08        .19

to .182. Another result from this study shows that the odds of
reenlistment for four-year enlistees with one or more dependents equal
.457 to one. If we omit the interaction term and thereby use only the
main effects, we would get a substantial underestimate, namely, .313
to one. If we continue in this way, all kinds of odds of reenlistment
can be developed as a function of predictor variables. Tables 5 and 6
indicate some of these odds as well as the probability of reenlistment
for various profiles. Many more reenlistment results from this
large-scale study can be obtained from Solomon, Haber and Ireland [11].
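The probability columns in Tables 5 and 6 are tied to the odds columns by the relation probability = odds / (1 + odds). The sketch below checks that relation against a few Table 5 rows; the function names and the choice of rows are illustrative only, and the printed probabilities in the tables are given to two decimals, so agreement is expected only to within rounding.

```python
def odds_to_probability(odds: float) -> float:
    """Convert odds (to one) in favor of an event into its probability."""
    return odds / (1.0 + odds)


def probability_to_odds(p: float) -> float:
    """Convert a probability (0 <= p < 1) into odds (to one) in favor."""
    return p / (1.0 - p)


# (odds, tabulated probability) pairs copied from Table 5.
table5_rows = {
    "Two Years": (0.041, 0.04),
    "Three Years": (0.055, 0.05),
    "Four Years": (0.182, 0.15),
    "White": (0.047, 0.04),
    "Non-white": (0.117, 0.10),
    "West": (0.130, 0.12),
}

for label, (odds, tabulated) in table5_rows.items():
    computed = odds_to_probability(odds)
    print(f"{label:12s} odds={odds:.3f} computed={computed:.3f} "
          f"tabulated={tabulated:.2f}")
```

The same conversion applied to the unconditional odds of .074 gives a probability close to the overall reenlistment rate of .07.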